1. INTRODUCTION


Recently, considerable interest has been shown in developing techniques for
the classification of imagery data such as remote sensing data obtained using
the multispectral scanner (MSS) on board the Landsat for inventorying natural
resources, monitoring crop conditions, detecting mineral and oil deposits,
etc. Usually, the inherent classes in the data are multimodal, and
nonsupervised classification or clustering techniques (refs. 1-3) have been
found to be effective (refs. 4, 5) in the classification of imagery data.
Clustering the data partitions the image into its inherent modes or clusters.
Labeling the clusters is one of the crucial problems in the application of
clustering techniques for the classification of imagery data.

Cluster labeling is similar to the problem of labeling the regions obtained by
using segmentation algorithms in the development of scene understanding
systems. The recent literature shows considerable interest in the use of
relaxation labeling algorithms for labeling the segmented regions (refs. 6-8).
These algorithms use relational properties of the regions through
compatibility coefficients. In cluster labeling, the relational properties of
the clusters are either not available or not meaningful. For example, in
aerospace agricultural imagery, the regions of interest are crops,
nonagricultural areas, etc. These can be anywhere in the image. Hence, it is
not meaningful to define relational properties for the clusters.

Most of the imagery data contain much spatial information, and several 
researchers (refs. 9-12) have attempted to use spatial information in the 
classification of imagery data. 

This paper documents an investigation of the problem of labeling the clusters
using spectral and spatial information. It is assumed that the probability
density functions and a priori probabilities of the clusters or modes are
given. Let these respectively be $p(X \mid \Omega = i)$ and $\delta_i$,
$i = 1,2,\ldots,m$, where m is the number of modes or clusters. It is also
assumed that a set of labeled patterns $X_i(j)$ with labels $\omega_i(j) = i$
and their neighboring patterns $Y_i^{\ell}(j)$ ($\ell = 1,2,\ldots,4$;
$j = 1,2,\ldots,N_i$; and $i = 1,2,\ldots,C$) are given, where C is the number
of classes.


In remote sensing, the labels for the patterns are provided by an analyst 
interpreter (AI), who examines imagery films and uses other data such as 
historic information and crop calendar models. Very often the AI labels are 
imperfect. Recently, Chittineni (refs. 13-15) investigated techniques for the 
estimation of probabilities of label imperfections using imperfectly labeled 
and unlabeled patterns. It is assumed that the probabilities of label 
imperfections are available. Methods are developed in the paper for obtain- 
ing probabilities of class labels for the clusters using all the available 
information. 

This paper is organized as follows. In section 2, a relationship is developed 
between class conditional densities and cluster conditional densities in terms 
of probabilities of class labels for the clusters. Section 3 concerns the 
problem of obtaining probabilities of class labels for the clusters without 
using spatial information. Expressions are presented in section 4 for updat- 
ing the a posteriori probabilities of the classes of a pixel using spectral 
and spatial information from its neighborhood. Section 5 deals with the 
problem of obtaining probabilities of class labels for the clusters using 
spectral and spatial information. Imperfections in the labels of the given 
pattern set are considered in section 6. Section 7 contains the experimental 
results in the processing of remotely sensed imagery data, and the concluding 
remarks are given in section 8. In Appendix A, the problem of obtaining the 
probabilities of class labels for the clusters using information from a given 
set of labeled fields is considered. Contextual cluster labeling with the 
probability of correct labeling as a criterion is treated in Appendix B. 

2. A RELATIONSHIP BETWEEN CLUSTER AND CLASS CONDITIONAL DENSITIES


In this section, a relationship is developed between cluster and class con- 
ditional densities. In general, the class conditional density functions are 
multimodal. Let C be the number of classes and m be the number of clusters. 
Let $p(X \mid \omega = i)$ be the class conditional densities and
$p(X \mid \Omega = i)$ be the mode or cluster conditional densities. Let
$P(\omega = i)$ and $P(\Omega = i)$ be the a priori
probability of class i and the a priori probability of cluster i,
respectively. The mixture density p(X) can be written in terms of class 
conditional densities as 

$$p(X) = \sum_{i=1}^{C} P(\omega = i)\, p(X \mid \omega = i) \tag{1}$$

The mixture density p(X) can also be written in terms of mode conditional 
densities as 

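$$\begin{aligned}
p(X) &= \sum_{\ell=1}^{m} P(\Omega = \ell)\, p(X \mid \Omega = \ell) \\
     &= \sum_{\ell=1}^{m} p(X \mid \Omega = \ell) \sum_{i=1}^{C} P(\Omega = \ell, \omega = i) \\
     &= \sum_{i=1}^{C} P(\omega = i) \sum_{\ell=1}^{m} P(\Omega = \ell \mid \omega = i)\, p(X \mid \Omega = \ell)
\end{aligned} \tag{2}$$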

The following assumption is made from comparing equations (1) and (2). 

$$p(X \mid \omega = i) = \sum_{\ell=1}^{m} P(\Omega = \ell \mid \omega = i)\, p(X \mid \Omega = \ell) \tag{3}$$

Equation (3) can be rewritten as 

$$p(\omega = i \mid X) = \sum_{\ell=1}^{m} \alpha_{\ell i}\, p(\Omega = \ell \mid X) \tag{4}$$

where $\alpha_{\ell i} = P(\omega = i \mid \Omega = \ell)$ is the probability that
the label of mode $\ell$ is class i. The probabilities $\alpha_{\ell i}$ satisfy
the constraints given in equation (5).

$$\begin{aligned}
\alpha_{\ell i} &\ge 0; \quad i = 1,2,\ldots,C \text{ and } \ell = 1,2,\ldots,m \\
\sum_{i=1}^{C} \alpha_{\ell i} &= 1; \quad \ell = 1,2,\ldots,m
\end{aligned} \tag{5}$$

Equation (3) provides a relationship between class and cluster conditional 
densities in terms of probabilities of class labels for the clusters. 
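
The mapping of equation (4) from cluster posteriors to class posteriors can be
sketched as follows. This is only an illustration: the function name
`class_posteriors` and the numerical values for `alpha` and `cluster_post`
are hypothetical and do not come from the paper.

```python
def class_posteriors(cluster_post, alpha):
    """Equation (4): p(w = i | X) = sum over l of alpha[l][i] * p(Omega = l | X).

    cluster_post[l] is p(Omega = l | X); alpha[l][i] is P(w = i | Omega = l),
    with each row of alpha summing to 1 as required by equation (5).
    """
    n_classes = len(alpha[0])
    return [
        sum(alpha[l][i] * cluster_post[l] for l in range(len(alpha)))
        for i in range(n_classes)
    ]


# Hypothetical example with m = 3 clusters and C = 2 classes.
alpha = [[0.9, 0.1],
         [0.2, 0.8],
         [0.5, 0.5]]
cluster_post = [0.6, 0.3, 0.1]
post = class_posteriors(cluster_post, alpha)
print(post)
```

Because every row of `alpha` sums to one and the cluster posteriors sum to
one, the resulting class posteriors also sum to one.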

3. MAXIMUM LIKELIHOOD PROBABILISTIC CLUSTER LABELING

This section concerns the problem of obtaining the probabilities
$\alpha_{\ell i}$ (the probabilities of class labels for the clusters). It is
assumed that we are given a set of labeled patterns $X_i(j)$ with class labels
$\omega_i(j) = i$; $j = 1,2,\ldots,N_i$ and $i = 1,2,\ldots,C$. It is also
assumed that the a priori probabilities of the modes or clusters and mode
conditional densities are given.

Let $\delta_\ell$ and $p(X \mid \Omega = \ell)$ be the mode a priori
probabilities and mode conditional densities, respectively. The criterion used
in obtaining the probabilistic description of class labels for the clusters is
the likelihood function. The likelihood of an occurrence of patterns $X_i(j)$
with their labels $\omega_i(j) = i$ is given by

$$L_1' = \prod_{i=1}^{C} \prod_{j=1}^{N_i} p[X_i(j), \omega_i(j) = i] \tag{6}$$

Since $\prod_{i=1}^{C} \prod_{j=1}^{N_i} p[X_i(j)]$ is independent of
$\omega_i(j)$, for mathematical simplicity, dividing the above equation by it
yields

$$L_1 = \prod_{i=1}^{C} \prod_{j=1}^{N_i} \frac{p[X_i(j), \omega_i(j) = i]}{p[X_i(j)]} \tag{7}$$


Noting that the logarithm is a monotonic function of its argument and taking
the logarithm of $L_1$ of equation (7) and using equation (4) yield the
following.

$$L = \log(L_1) = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \log\left\{ \sum_{\ell=1}^{m} \alpha_{\ell i}\, p[\Omega = \ell \mid X_i(j)] \right\} \tag{8}$$

The probabilities $\alpha_{\ell i}$ satisfy the constraints given in equation
(5). Closed-form solutions for $\alpha_{\ell i}$ by maximizing L of equation
(8), subject to the constraints of equation (5), seem to be difficult to
obtain. The probabilities $\alpha_{\ell i}$ can easily be obtained using
optimization techniques such as the Davidon-Fletcher-Powell procedure
(refs. 16-18).

3.1 A FIXED-POINT ITERATION SCHEME FOR OPTIMAL $\alpha_{\ell i}$

The following fixed-point iteration equation (similar to maximum likelihood
equations in parametric clustering in reference 3) for the solution of the
above optimization problem can easily be obtained by introducing Lagrangian
multipliers. That is:

$$\alpha_{\ell i} = \frac{\sum_{j=1}^{N_i} d_{\ell i j}}{\sum_{r=1}^{C} \sum_{j=1}^{N_r} d_{\ell r j}} \tag{9}$$

where

$$d_{\ell i j} = \frac{\alpha_{\ell i}\, p[\Omega = \ell \mid X_i(j)]}{\sum_{s=1}^{m} \alpha_{s i}\, p[\Omega = s \mid X_i(j)]} \tag{10}$$

However, closed-form solutions for $\alpha_{\ell i}$ can be obtained with the
criterion as the maximization of a lower bound on L, and they are given in the
next section.
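
A minimal sketch of the fixed-point iteration of equations (9) and (10) is
given below. The function name, the uniform starting point, the fixed number
of iterations, and the toy posteriors are all assumptions made for
illustration; the sketch also assumes every cluster receives nonzero posterior
mass from the labeled patterns.

```python
def fixed_point_alpha(post, n_iter=200):
    """Iterate equations (9) and (10).

    post[i][j][l] is p(Omega = l | X_i(j)), the cluster posterior of the j-th
    labeled pattern of class i. Returns alpha[l][i] = P(class i | cluster l).
    """
    n_classes = len(post)
    n_clusters = len(post[0][0])
    # Assumed starting point: uniform class probabilities for every cluster.
    alpha = [[1.0 / n_classes] * n_classes for _ in range(n_clusters)]
    for _ in range(n_iter):
        sums = [[0.0] * n_classes for _ in range(n_clusters)]
        for i in range(n_classes):
            for p in post[i]:
                denom = sum(alpha[s][i] * p[s] for s in range(n_clusters))
                for l in range(n_clusters):
                    # d_{l i j} of equation (10), accumulated over j.
                    sums[l][i] += alpha[l][i] * p[l] / denom
        # Equation (9): normalize over classes for each cluster.
        alpha = [[sums[l][i] / sum(sums[l]) for i in range(n_classes)]
                 for l in range(n_clusters)]
    return alpha


# Hypothetical cluster posteriors: 2 classes, 2 labeled patterns each, 2 clusters.
posteriors = [
    [[0.9, 0.1], [0.8, 0.2]],   # patterns labeled class 1
    [[0.2, 0.8], [0.1, 0.9]],   # patterns labeled class 2
]
alpha = fixed_point_alpha(posteriors)
print(alpha)
```

Each pass keeps the rows of `alpha` normalized, so the constraints of
equation (5) hold at every iteration.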


3.2 CLOSED-FORM SOLUTIONS FOR THE PROBABILITIES $\alpha_{\ell i}$

Since the logarithm is a convex upward function, we have the inequality

$$\log\left[ \sum_{i=1}^{C} a_i\, g_i(X) \right] \ge \sum_{i=1}^{C} a_i \log[g_i(X)] \tag{11}$$

where

$$\sum_{i=1}^{C} a_i = 1 \quad \text{and} \quad a_i \ge 0; \quad i = 1,2,\ldots,C \tag{12}$$


Using the inequality of equation (11) in equation (8), a lower bound on the
log likelihood function L can be obtained as

$$L \ge \sum_{i=1}^{C} \sum_{j=1}^{N_i} \sum_{\ell=1}^{m} p[\Omega = \ell \mid X_i(j)] \log(\alpha_{\ell i}) \tag{13}$$

With the introduction of the Lagrangian multipliers, the probabilities
$\alpha_{\ell i}$ that maximize the lower bound of equation (13), subject to
the constraints of equation (5), can be obtained as follows.

$$\alpha_{\ell i} = \frac{N_i\, e_{i\ell}}{\sum_{r=1}^{C} N_r\, e_{r\ell}} \tag{14}$$

where

$$e_{i\ell} = \frac{1}{N_i} \sum_{j=1}^{N_i} p[\Omega = \ell \mid X_i(j)] \tag{15}$$


This solution simply states that the probability of the ith class label for a
given cluster $\ell$ is the ratio of the sum of the a posteriori probabilities
of cluster $\ell$ given the labeled patterns from class i to the sum over all
classes of the sum of a posteriori probabilities of cluster $\ell$ given the
labeled patterns from each class. Having obtained $\alpha_{\ell i}$, $q_i$
(the proportion of class i) can be estimated as follows.

$$q_i = P(\omega = i) = \sum_{\ell=1}^{m} P(\omega = i, \Omega = \ell) = \sum_{\ell=1}^{m} \delta_\ell\, \alpha_{\ell i} \tag{16}$$

Hence, $\hat{q}_i$ (the estimate of $q_i$) can be computed from the following.

$$\hat{q}_i = \sum_{\ell=1}^{m} \delta_\ell\, \hat{\alpha}_{\ell i} \tag{17}$$
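
The closed-form labeling of equations (14) and (15), followed by the
proportion estimate of equation (16), can be sketched as below. The function
names, the cluster priors `delta`, and the toy posteriors are hypothetical
values chosen only to make the example runnable.

```python
def closed_form_alpha(post):
    """Equations (14)-(15): alpha[l][i] is proportional to the summed
    posterior of cluster l over the labeled patterns of class i.

    post[i][j][l] is p(Omega = l | X_i(j)).
    """
    n_classes = len(post)
    n_clusters = len(post[0][0])
    # totals[l][i] = N_i * e_{il} = sum over j of p(Omega = l | X_i(j)).
    totals = [[sum(p[l] for p in post[i]) for i in range(n_classes)]
              for l in range(n_clusters)]
    return [[totals[l][i] / sum(totals[l]) for i in range(n_classes)]
            for l in range(n_clusters)]


def class_proportions(delta, alpha):
    """Equation (16): q_i = sum over l of delta[l] * alpha[l][i]."""
    n_classes = len(alpha[0])
    return [sum(delta[l] * alpha[l][i] for l in range(len(delta)))
            for i in range(n_classes)]


# Hypothetical inputs: 2 classes, 2 labeled patterns each, 2 clusters.
posteriors = [
    [[0.9, 0.1], [0.8, 0.2]],   # patterns labeled class 1
    [[0.2, 0.8], [0.1, 0.9]],   # patterns labeled class 2
]
delta = [0.4, 0.6]              # assumed cluster a priori probabilities
alpha_cf = closed_form_alpha(posteriors)
q_hat = class_proportions(delta, alpha_cf)
print(alpha_cf, q_hat)
```

Unlike the fixed-point scheme, this estimate needs a single pass over the
labeled patterns, at the cost of maximizing only a lower bound on L.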


4. UPDATING A POSTERIORI PROBABILITIES OF THE CLASSES OF 
A PIXEL USING INFORMATION FROM ITS NEIGHBORHOOD 


The last section covered the problem of estimating the probabilities
$\alpha_{\ell i}$ (the probabilities of class labels for the clusters) using
information from a given set of labeled patterns. The probabilities
$\alpha_{\ell i}$ are seen to be functions of
$p[\omega_i(j) = i \mid X_i(j)]$, the a posteriori probabilities of the
classes, and spatial information is not used in obtaining $\alpha_{\ell i}$.
Most natural imagery is abundant in spatial information, which can be used to
obtain better estimates for $\alpha_{\ell i}$. In this section, expressions are
developed for updating the a posteriori probabilities of the classes of a
picture element (pixel) using information from its local neighborhood. These
expressions are used in section 5 to obtain the probabilities of class labels
for the clusters using both the spectral and spatial information.

Let the pixel under consideration be pixel 0. Its four neighbors in a two-
dimensional local neighborhood are shown in figure 1.


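[Figure 1: pixel 0, with pattern $X_i(j)$ and label $\omega_i(j)$, surrounded
by its four side-adjacent neighbors 1 (above), 2 (right), 3 (below), and 4
(left), with patterns $Y_i^1(j)$, $Y_i^2(j)$, $Y_i^3(j)$, and $Y_i^4(j)$.]

Figure 1.- Four neighbors of a pixel 0.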




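The following a posteriori probabilities of the classes of a pixel 0 are
obtained by using information from its local neighborhood.

$$P[\omega_i(j) = k \mid X_i(j), Y_i^1(j), \ldots, Y_i^4(j)] = \frac{p[\omega_i(j) = k, X_i(j), Y_i^1(j), \ldots, Y_i^4(j)]}{p[X_i(j), Y_i^1(j), \ldots, Y_i^4(j)]} \tag{18}$$

The denominator of equation (18) can be written as

$$p[X_i(j), Y_i^1(j), \ldots, Y_i^4(j)] = \sum_{k=1}^{C} p[\omega_i(j) = k, X_i(j), Y_i^1(j), \ldots, Y_i^4(j)] \tag{19}$$

Similarly, from the numerator of equation (18), we obtain

$$\begin{aligned}
p[\omega_i(j) = k, X_i(j), Y_i^1(j), \ldots, Y_i^4(j)]
 &= \sum_{k_1=1}^{C} \cdots \sum_{k_4=1}^{C} p[\omega_i(j) = k, X_i(j), Y_i^1(j), \omega_i^1(j) = k_1, \ldots, Y_i^4(j), \omega_i^4(j) = k_4] \\
 &= \sum_{k_1=1}^{C} \cdots \sum_{k_4=1}^{C} p[X_i(j), Y_i^1(j), \ldots, Y_i^4(j) \mid \omega_i(j) = k, \omega_i^1(j) = k_1, \ldots, \omega_i^4(j) = k_4] \\
 &\qquad \times P[\omega_i(j) = k, \omega_i^1(j) = k_1, \ldots, \omega_i^4(j) = k_4]
\end{aligned} \tag{20}$$

where C is the number of classes.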

In the following, it is assumed (a) that the probability density function of a 
pattern, given its label, is independent of other patterns and their labels 
and (b) that the labels of the patterns are independent of the labels of their 
nonneighbors. In the following analysis, the pixels having a common side are 
considered as neighbors. (For example, in figure 1, pixels 0 and 1 are 
neighbors, whereas pixels 1 and 2 are nonneighbors.) By repeatedly using 
assumption (a), the following is obtained. 



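$$\begin{aligned}
& p[X_i(j), Y_i^1(j), \ldots, Y_i^4(j) \mid \omega_i(j) = k, \omega_i^1(j) = k_1, \ldots, \omega_i^4(j) = k_4] \\
&\quad = p[X_i(j) \mid \omega_i(j) = k, Y_i^1(j), \omega_i^1(j) = k_1, \ldots, Y_i^4(j), \omega_i^4(j) = k_4] \\
&\qquad \times p[Y_i^1(j), \ldots, Y_i^4(j) \mid \omega_i(j) = k, \omega_i^1(j) = k_1, \ldots, \omega_i^4(j) = k_4] \\
&\quad = p[X_i(j) \mid \omega_i(j) = k]\; p[Y_i^1(j), \ldots, Y_i^4(j) \mid \omega_i(j) = k, \omega_i^1(j) = k_1, \ldots, \omega_i^4(j) = k_4] \\
&\quad = p[X_i(j) \mid \omega_i(j) = k] \prod_{\ell=1}^{4} p[Y_i^{\ell}(j) \mid \omega_i^{\ell}(j) = k_\ell]
\end{aligned} \tag{21}$$

By repeatedly using assumption (b), the second term in the summations of
equation (20) can be written as follows.

$$\begin{aligned}
& P[\omega_i(j) = k, \omega_i^1(j) = k_1, \ldots, \omega_i^4(j) = k_4] \\
&\quad = P[\omega_i(j) = k]\; P[\omega_i^1(j) = k_1, \ldots, \omega_i^4(j) = k_4 \mid \omega_i(j) = k] \\
&\quad = P[\omega_i(j) = k]\; P[\omega_i^1(j) = k_1 \mid \omega_i(j) = k, \omega_i^2(j) = k_2, \ldots, \omega_i^4(j) = k_4] \\
&\qquad \times P[\omega_i^2(j) = k_2, \ldots, \omega_i^4(j) = k_4 \mid \omega_i(j) = k] \\
&\quad = P[\omega_i(j) = k] \prod_{\ell=1}^{4} P[\omega_i^{\ell}(j) = k_\ell \mid \omega_i(j) = k]
\end{aligned} \tag{22}$$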
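
Using equations (21) and (22) in equation (20) results in

$$\begin{aligned}
& p[\omega_i(j) = k, X_i(j), Y_i^1(j), \ldots, Y_i^4(j)] \\
&\quad = P[\omega_i(j) = k]\; p[X_i(j) \mid \omega_i(j) = k] \sum_{k_1=1}^{C} \cdots \sum_{k_4=1}^{C} \prod_{\ell=1}^{4} p[Y_i^{\ell}(j) \mid \omega_i^{\ell}(j) = k_\ell]\; P[\omega_i^{\ell}(j) = k_\ell \mid \omega_i(j) = k] \\
&\quad = P[\omega_i(j) = k]\; p[X_i(j) \mid \omega_i(j) = k] \prod_{\ell=1}^{4} \left\{ \sum_{k_\ell=1}^{C} p[Y_i^{\ell}(j) \mid \omega_i^{\ell}(j) = k_\ell]\; P[\omega_i^{\ell}(j) = k_\ell \mid \omega_i(j) = k] \right\}
\end{aligned} \tag{23}$$

From equations (18), (19), and (23), we obtain

$$P[\omega_i(j) = k \mid X_i(j), Y_i^1(j), \ldots, Y_i^4(j)] = \frac{P[\omega_i(j) = k]\; p[X_i(j) \mid \omega_i(j) = k] \prod_{\ell=1}^{4} \left\{ \sum_{k_\ell=1}^{C} p[Y_i^{\ell}(j) \mid \omega_i^{\ell}(j) = k_\ell]\; P[\omega_i^{\ell}(j) = k_\ell \mid \omega_i(j) = k] \right\}}{\sum_{s=1}^{C} P[\omega_i(j) = s]\; p[X_i(j) \mid \omega_i(j) = s] \prod_{\ell=1}^{4} \left\{ \sum_{k_\ell=1}^{C} p[Y_i^{\ell}(j) \mid \omega_i^{\ell}(j) = k_\ell]\; P[\omega_i^{\ell}(j) = k_\ell \mid \omega_i(j) = s] \right\}} \tag{24}$$

In equation (24), the spectral and spatial information from the neighborhood
of a pixel is used in obtaining the a posteriori probabilities of its classes.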
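
The neighborhood update described in this section (joint probability of the
central label with its neighbors under assumptions (a) and (b), normalized
over the classes) can be sketched as follows. The function name and all
numerical values are hypothetical; the two-class prior and transition values
are chosen only for illustration.

```python
def contextual_posterior(prior, lik_center, lik_neighbors, trans):
    """Posterior of the central pixel's class given its own pattern and the
    patterns of its neighbors.

    prior[k]            : P(w = k) for the central pixel
    lik_center[k]       : p(X | w = k)
    lik_neighbors[l][k] : p(Y^l | w^l = k) for neighbor l
    trans[k][kl]        : P(w^l = kl | w = k)
    """
    n_classes = len(prior)
    joint = []
    for k in range(n_classes):
        term = prior[k] * lik_center[k]
        for lik in lik_neighbors:
            # Sum over the unknown label of this neighbor.
            term *= sum(lik[kl] * trans[k][kl] for kl in range(n_classes))
        joint.append(term)
    total = sum(joint)            # normalizing sum over the classes
    return [t / total for t in joint]


# Hypothetical two-class example with four neighbors.
prior = [0.5, 0.5]
trans = [[0.8, 0.2],
         [0.2, 0.8]]
lik_center = [0.6, 0.4]
lik_neighbors = [[0.7, 0.3]] * 4
post_ctx = contextual_posterior(prior, lik_center, lik_neighbors, trans)
post_plain = contextual_posterior(prior, lik_center, [], trans)
print(post_ctx, post_plain)
```

With an empty neighbor list the function returns the purely spectral
posterior, so the effect of the neighborhood can be seen by comparing the two
results.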

5. PROBABILISTIC CLUSTER LABELING WITH SPECTRAL AND SPATIAL INFORMATION


This section covers the problem of obtaining the probabilities
$\alpha_{\ell i}$ (the probabilities of class labels for the clusters) using
spectral and spatial information. It is assumed that we are given a set of
labeled patterns $X_i(j)$ with labels $\omega_i(j) = i$ and their neighbors
$Y_i^{\ell}(j)$, $\ell = 1,2,\ldots,4$, as shown in figure 1. For
$j = 1,2,\ldots,N_i$ and $i = 1,2,\ldots,C$, the likelihood of occurrence of
patterns $X_i(j)$ with labels $\omega_i(j) = i$ and with $Y_i^{\ell}(j)$,
$\ell = 1,2,\ldots,4$, as their neighbors is given as

$$L_1 = \prod_{i=1}^{C} \prod_{j=1}^{N_i} \left\{ p[X_i(j), \omega_i(j) = i, Y_i^1(j), \ldots, Y_i^4(j)] \right\} \tag{25}$$


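From equations (24) and (25), the log likelihood function can be written,
apart from an additive constant, as

$$L = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \log\left\{ p[\omega_i(j) = i \mid X_i(j)] \right\} + \sum_{i=1}^{C} \sum_{j=1}^{N_i} \sum_{\ell=1}^{4} \log\left\{ \sum_{k_\ell=1}^{C} \frac{P[\omega_i^{\ell}(j) = k_\ell \mid \omega_i(j) = i]}{P[\omega_i^{\ell}(j) = k_\ell]}\; p[\omega_i^{\ell}(j) = k_\ell \mid Y_i^{\ell}(j)] \right\} \tag{26}$$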

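Using equation (4) in equation (26) yields

$$L = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \log\left\{ \sum_{r=1}^{m} \alpha_{r i}\, p[\Omega = r \mid X_i(j)] \right\} + \sum_{i=1}^{C} \sum_{j=1}^{N_i} \sum_{\ell=1}^{4} \log\left\{ \sum_{k_\ell=1}^{C} \frac{P[\omega_i^{\ell}(j) = k_\ell \mid \omega_i(j) = i]}{P[\omega_i^{\ell}(j) = k_\ell]} \sum_{r=1}^{m} \alpha_{r k_\ell}\, p[\Omega = r \mid Y_i^{\ell}(j)] \right\} \tag{27}$$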


Closed-form solutions for the probabilities $\alpha_{ri}$ that maximize L of
equation (27), subject to the constraints of equation (5), seem to be
difficult to obtain. Optimization methods (refs. 16-18) such as the
Davidon-Fletcher-Powell procedure can easily be used to obtain probabilities
$\alpha_{ri}$ that maximize L of equation (27), subject to the constraints of
equation (5). By introducing Lagrangian multipliers, the following fixed-point
iteration equation for the solution of the above optimization problem can
easily be obtained. That is,



(29) 


If the spatial information is not used (that is, when 6^ = 0), it is easily 
seen that equation (28) becomes identical to equation (9). 

6. CLUSTER LABELING WHEN THE LABELS OF THE
GIVEN PATTERN SET ARE IMPERFECT


In practice, such as in the classification of remotely sensed MSS imagery
data, it is difficult and expensive to obtain labels for the training
patterns. The labels for the patterns are usually provided by an AI who
examines imagery films and uses some other information. (For example, in
labeling pixels of remote sensing agricultural imagery, the information that
is most often used is historic information, crop growth stage models, etc.)
These labels are very often imperfect. Recently, there has been considerable
interest (refs. 13-15) in estimating the probabilities of label imperfections
and using these estimates to obtain improved classification and to identify
mislabeled patterns with a specified degree of confidence. This section
pertains to the problem of probabilistic cluster labeling by taking into
account the imperfections in the labels of the given labeled pattern set.
Let $\omega$ and $\omega'$ be the perfect and imperfect labels, respectively,
each of which takes values $1,2,\ldots,C$. The imperfections in the labels are
described by the probabilities

$$\beta_{ji} = P(\omega' = i \mid \omega = j) \tag{30}$$

where

$$\sum_{i=1}^{C} \beta_{ji} = 1 \tag{31}$$


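To obtain a relationship between class conditional densities with and without
imperfections in the labels, consider

$$\begin{aligned}
p(X \mid \omega' = i) &= \frac{1}{P(\omega' = i)} \sum_{j=1}^{C} p(X, \omega' = i, \omega = j) \\
 &= \frac{1}{P(\omega' = i)} \sum_{j=1}^{C} p(X \mid \omega' = i, \omega = j)\, P(\omega' = i \mid \omega = j)\, P(\omega = j) \\
 &= \frac{1}{P(\omega' = i)} \sum_{j=1}^{C} \beta_{ji}\, P(\omega = j)\, p(X \mid \omega = j)
\end{aligned} \tag{32}$$

where it is assumed that

$$p(X \mid \omega' = i, \omega = j) = p(X \mid \omega = j) \tag{33}$$

Using the Bayes rule, from equation (32), we obtain

$$P(\omega' = i \mid X) = \sum_{j=1}^{C} \beta_{ji}\, P(\omega = j \mid X) \tag{34}$$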
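
Equation (34) maps posteriors of the true labels to posteriors of the
imperfect labels through the matrix of label imperfection probabilities. A
small sketch follows; the function name and the numerical values of `beta`
and `true_post` are hypothetical.

```python
def imperfect_label_posteriors(true_post, beta):
    """Equation (34): P(w' = i | X) = sum over j of beta[j][i] * P(w = j | X).

    beta[j][i] = P(w' = i | w = j); each row of beta sums to 1 (equation (31)).
    """
    n_classes = len(beta)
    return [sum(beta[j][i] * true_post[j] for j in range(n_classes))
            for i in range(n_classes)]


# Hypothetical two-class example.
beta = [[0.9, 0.1],
        [0.2, 0.8]]
true_post = [0.7, 0.3]
noisy_post = imperfect_label_posteriors(true_post, beta)
print(noisy_post)
```

When `beta` is the identity matrix the labels are perfect and the function
returns `true_post` unchanged.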


In the following, it is assumed that a set of labeled patterns $X_i(j)$ with
imperfect labels $\omega_i'(j) = i$ and with the neighbors
$Y_i^1(j), \ldots, Y_i^4(j)$ as shown in figure 1 for $j = 1,2,\ldots,N_i$ and
$i = 1,2,\ldots,C$ is given. It is also assumed that the probabilities of
label imperfections $\beta_{ji}$ are available. The probabilities of
imperfections in the labels being $\beta_{ji}$, the likelihood of the
occurrence of patterns $X_i(j)$ with imperfect labels $\omega_i'(j) = i$,
$Y_i^1(j), \ldots, Y_i^4(j)$ being their neighbors, is given by the following.

$$L_1 = \prod_{i=1}^{C} \prod_{j=1}^{N_i} p[\omega_i'(j) = i, X_i(j), Y_i^1(j), \ldots, Y_i^4(j)] \tag{35}$$


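Consider

$$\begin{aligned}
& p[\omega_i'(j) = k, X_i(j), Y_i^1(j), \ldots, Y_i^4(j)] \\
&\quad = \sum_{k_1=1}^{C} \cdots \sum_{k_4=1}^{C} p[\omega_i'(j) = k, X_i(j), Y_i^1(j), \omega_i^1(j) = k_1, \ldots, Y_i^4(j), \omega_i^4(j) = k_4] \\
&\quad = \sum_{k_1=1}^{C} \cdots \sum_{k_4=1}^{C} p[X_i(j), Y_i^1(j), \ldots, Y_i^4(j) \mid \omega_i'(j) = k, \omega_i^1(j) = k_1, \ldots, \omega_i^4(j) = k_4] \\
&\qquad \times P[\omega_i'(j) = k, \omega_i^1(j) = k_1, \ldots, \omega_i^4(j) = k_4]
\end{aligned} \tag{36}$$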

Given the probabilities of imperfections in the labels and proceeding
similarly to equations (21) and (22) while using assumptions (a) and (b) of
section 4, the following can easily be obtained.

$$p[X_i(j), Y_i^1(j), \ldots, Y_i^4(j) \mid \omega_i'(j) = k, \omega_i^1(j) = k_1, \ldots, \omega_i^4(j) = k_4] = p[X_i(j) \mid \omega_i'(j) = k] \prod_{\ell=1}^{4} p[Y_i^{\ell}(j) \mid \omega_i^{\ell}(j) = k_\ell] \tag{37}$$

and

$$P[\omega_i'(j) = k, \omega_i^1(j) = k_1, \ldots, \omega_i^4(j) = k_4] = P[\omega_i'(j) = k] \prod_{\ell=1}^{4} P[\omega_i^{\ell}(j) = k_\ell \mid \omega_i'(j) = k] \tag{38}$$


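Since $p[X_i(j)] \prod_{\ell=1}^{4} p[Y_i^{\ell}(j)]$ is a constant, dividing
equation (35) by it and using equations (37) and (38) in equation (35) yields

$$\begin{aligned}
\frac{p[\omega_i'(j) = k, X_i(j), Y_i^1(j), \ldots, Y_i^4(j)]}{p[X_i(j)] \prod_{\ell=1}^{4} p[Y_i^{\ell}(j)]}
 &= p[\omega_i'(j) = k \mid X_i(j)] \prod_{\ell=1}^{4} \left\{ \sum_{k_\ell=1}^{C} p[\omega_i^{\ell}(j) = k_\ell \mid Y_i^{\ell}(j)]\, \frac{P[\omega_i^{\ell}(j) = k_\ell \mid \omega_i'(j) = k]}{P[\omega_i^{\ell}(j) = k_\ell]} \right\} \\
 &= \frac{p[\omega_i'(j) = k \mid X_i(j)]}{\{P[\omega_i'(j) = k]\}^4} \prod_{\ell=1}^{4} \left\{ \sum_{k_\ell=1}^{C} p[\omega_i^{\ell}(j) = k_\ell \mid Y_i^{\ell}(j)]\, \nu_{k_\ell k} \right\}
\end{aligned} \tag{39}$$

where

$$\nu_{k_\ell k} = \sum_{s=1}^{C} \beta_{sk}\, \frac{P[\omega_i(j) = s]}{P[\omega_i^{\ell}(j) = k_\ell]}\, P[\omega_i^{\ell}(j) = k_\ell \mid \omega_i(j) = s] \tag{40}$$

and it is assumed that

$$\beta_{sk} = P[\omega_i'(j) = k \mid \omega_i(j) = s] = P[\omega_i'(j) = k \mid \omega_i(j) = s, \omega_i^{\ell}(j) = k_\ell] \tag{41}$$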


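Since the logarithm is a monotonic function of its argument, taking the
logarithm of equation (35), using equation (39) in equation (35), and treating
a priori probabilities of the imperfect labels as constant, the log likelihood
function becomes

$$L = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \log\left\{ p[\omega_i'(j) = i \mid X_i(j)] \right\} + \sum_{i=1}^{C} \sum_{j=1}^{N_i} \sum_{\ell=1}^{4} \log\left\{ \sum_{k_\ell=1}^{C} \nu_{k_\ell i}\, p[\omega_i^{\ell}(j) = k_\ell \mid Y_i^{\ell}(j)] \right\} \tag{42}$$

Using equations (4) and (33) in equation (42) yields

$$L = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \log\left\{ \sum_{r=1}^{m} \sum_{s=1}^{C} \alpha_{rs}\, \beta_{si}\, p[\Omega = r \mid X_i(j)] \right\} + \sum_{i=1}^{C} \sum_{j=1}^{N_i} \sum_{\ell=1}^{4} \log\left\{ \sum_{r=1}^{m} \sum_{k_\ell=1}^{C} \alpha_{r k_\ell}\, \nu_{k_\ell i}\, p[\Omega = r \mid Y_i^{\ell}(j)] \right\} \tag{43}$$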


Optimization methods such as the Davidon-Fletcher-Powell procedure
(refs. 16-18) can easily be used to obtain the optimal $\alpha_{uv}$ that
maximizes L, subject to the constraints of equation (5). Also, fixed-point
iteration equations similar to equation (28) can easily be derived to obtain
the optimal $\alpha_{uv}$ by introducing Lagrangian multipliers and are given
in the following.



( 44 ) 


6-4 


n 


where 


i c ^ 


3 vi p[fi * ujX i (j)] 


£ “rs 8 5l^ n ’ r l x i <J>3 
r=l s=i 


(45) 


and 


uv 


C N 1 4 

E EE 

1=1 j»l A*! 


v v1 P_« ■ u|V*(J)_ 


£ £ a rk v k ip[ n " r l y i(^l 

m kT=l rK « V L 1 J 


(46) 


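If the spatial information is not used, the fixed-point iteration
equation (44) for obtaining $\alpha_{uv}$, the probabilities of class labels
for the clusters, becomes the following.

$$\alpha_{uv} = \frac{\sum_{i=1}^{C} \sum_{j=1}^{N_i} d_{ijvu}}{\sum_{s=1}^{C} \sum_{i=1}^{C} \sum_{j=1}^{N_i} d_{ijsu}} \tag{47}$$

where

$$d_{ijvu} = \frac{\alpha_{uv}\, \beta_{vi}\, p[\Omega = u \mid X_i(j)]}{\sum_{s=1}^{C} \sum_{k=1}^{m} \alpha_{ks}\, \beta_{si}\, p[\Omega = k \mid X_i(j)]} \tag{48}$$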
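
A sketch of the iteration without spatial information, in which the cluster
posteriors of each imperfectly labeled pattern are weighted by the label
imperfection probabilities, is given below. The function name, the uniform
starting point, the iteration count, and all numerical inputs are assumptions
made for illustration only.

```python
def fixed_point_alpha_imperfect(post, beta, n_iter=200):
    """Fixed-point update for alpha[u][v] = P(true class v | cluster u) when
    the given labels are imperfect.

    post[i][j][u] : p(Omega = u | X_i(j)) for the j-th pattern carrying
                    imperfect label i
    beta[v][i]    : P(imperfect label i | true label v)
    """
    n_classes = len(beta)
    n_clusters = len(post[0][0])
    alpha = [[1.0 / n_classes] * n_classes for _ in range(n_clusters)]
    for _ in range(n_iter):
        sums = [[0.0] * n_classes for _ in range(n_clusters)]
        for i in range(n_classes):
            for p in post[i]:
                denom = sum(alpha[r][s] * beta[s][i] * p[r]
                            for r in range(n_clusters)
                            for s in range(n_classes))
                for u in range(n_clusters):
                    for v in range(n_classes):
                        sums[u][v] += alpha[u][v] * beta[v][i] * p[u] / denom
        # Normalize over the true classes for each cluster.
        alpha = [[sums[u][v] / sum(sums[u]) for v in range(n_classes)]
                 for u in range(n_clusters)]
    return alpha


# Hypothetical inputs: 2 classes, 2 clusters, 2 patterns per imperfect label.
posteriors = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.2, 0.8], [0.1, 0.9]],
]
identity = [[1.0, 0.0], [0.0, 1.0]]
noisy = [[0.9, 0.1], [0.2, 0.8]]
alpha_perfect = fixed_point_alpha_imperfect(posteriors, identity)
alpha_noisy = fixed_point_alpha_imperfect(posteriors, noisy)
print(alpha_perfect, alpha_noisy)
```

With an identity `beta` the update only accumulates terms where the true and
given labels coincide, which is the perfect-label case of section 3.1.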

7. EXPERIMENTAL RESULTS


This section presents some results obtained in the processing of remotely
sensed Landsat MSS imagery data. The objective of the processing is to
estimate the proportion of the class of interest in each image. There are two
classes in the image. Class 1 is wheat, and class 2 is nonwheat, which is
designated as "other." The class of interest is wheat. The MSS images of
several segments were processed in the following manner. [A segment is a 9-
by 11-kilometer (5- by 6-nautical-mile) area for which the MSS image is
divided into a rectangular array of pixels, 117 rows by 196 columns.] The
image is overlaid with a rectangular grid of 209 grid intersections.

Class labels were given to the pixels corresponding to a subset of 209 grid
intersections by an AI who examined the imagery films and used some other
information such as crop growth stage models and historic information. These
are imperfect labels. Also, ground-truth labels or true labels of these
pixels are acquired.

The numbers and locations of the segments, the number of pixels labeled, and
the number of features or the number of channels used for each segment are
listed in table 1. Several acquisitions were used for each segment. The
Gaussian mode (cluster) conditional densities and a priori probabilities of
the inherent modes in the data of each segment are obtained using a maximum
likelihood clustering algorithm (refs. 3, 19). The number of clusters
generated for each segment is listed in table 1. The theory developed in
sections 3 and 5 is applied in estimating the probabilities of class labels
for the clusters of each segment using AI-labeled patterns and
ground-truth-labeled patterns, both with and without the use of contextual
information.

The proportion of class 1, the class of interest, is estimated for each
segment using equation (17) for all the cases, and the estimates are listed
in table 1. The proportion of class 1 of each segment based on true [ground
truth (GT)] labels of all the pixels in the segment is listed in the last
column of table 1.

TABLE 1.- PROPORTION ESTIMATION THROUGH CLUSTER LABELING WITH AND WITHOUT
THE USE OF CONTEXTUAL INFORMATION

In equations (28) and (29), the following a priori and transition
probabilities are used, where $\omega_c$ is the label of the central pixel
and $\omega_N$ is the label of its neighbor.

$$\begin{aligned}
P(\omega_c = i) &= 0.5; \quad i = 1,2 \\
P(\omega_N = j \mid \omega_c = i) &= \begin{cases} 0.8 & \text{if } i = j \\ 0.2 & \text{if } i \ne j \end{cases}
\end{aligned} \tag{49}$$

From table 1, it is seen that considerable improvement has been made in the
proportion estimates with the use of contextual information if the labels
are good.

The probabilities of label imperfections of AI labels or the β-matrix are
estimated for each segment by comparing imperfect (AI) labels and perfect
(ground-truth) labels. These are listed in table 2. From tables 1 and 2, it
is observed that, when the imperfections in the labels are small, the use of
contextual information with the AI labels resulted in improved proportion
estimates (see segment 1231).

Equations (44), (45), and (46) are used with the AI labels and the
corresponding β-matrix for estimating the probabilities of class labels for
the clusters. The values used for a priori and transition probabilities in
these equations are given in equation (49). The resulting proportion estimates
are listed in column 5 of table 2. The proportion of wheat in each segment is
also estimated using equations (47) and (48) with the AI labels and the
corresponding β-matrix. The resulting proportion estimates are listed in
column 6 of table 2. From table 2, it is seen that there is considerable
improvement in proportion estimates when the probabilities of label
imperfections are taken into account.

TABLE 2.- ESTIMATION OF PROPORTION OF CLASS 1 WITH AI LABELS AND β-MATRIX

                                 No. of AI-labeled  Computed β-matrix   Proportion estimate
         Location                patterns           comparing AI and    eqs. (44),  eqs. (47)  directly      GT
Segment  (county, state)         Wheat   "Other"    GT labels           (45), (46)  and (48)   with eq. (9)  proportion
-------  ----------------------  -----   -------    ------------------  ----------  ---------  ------------  ----------
1005     Sherman, Texas            20       77      [0.5455  0.4545]    0.3227      0.3025     0.2456        0.348
                                                    [0.0308  0.9692]
1060     Cheyenne, Colorado        17       89      [0.5667  0.4343]    0.2174      0.2172     0.1975        0.231
                                                    [0.0263  0.9737]
1231     Jackson, Oklahoma         71       25      [0.9315  0.0685]    0.6921      0.7139     0.6265        0.744
                                                    [0.1304  0.8696]
1520     Big Stone, Montana        20       71      [0.7917  0.2083]    0.2432      0.3647     0.2109        0.301
                                                    [0.0149  0.9851]
1604     Renville, North Dakota    31       70      [0.4600  0.5400]    0.4937      0.4814     0.2963        0.524
                                                    [0.1569  0.8431]
1675     McPherson, South Dakota   10       97      [0.2667  0.7333]    0.2142      0.2156     0.1085        0.291
                                                    [0.0390  0.9610]
1805     Gregory, South Dakota     15      129      [0.4211  0.5789]    0.1569      0.1932     0.1181        0.164
                                                    [0.0640  0.9360]
1853     Ness, Kansas              24       67      [0.8077  0.1923]    0.3021      0.3052     0.3246        0.306
                                                    [0.0615  0.9385]

Bias                                                                    0.3271E-01  0.1441E-01 0.9763E-01
Mean square error                                                       0.1682E-02  0.1947E-02 0.1514E-01
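
The summary rows of table 2 can be recomputed from the per-segment entries.
The sketch below assumes that bias is the mean of (GT proportion minus
estimate) over the eight segments and that the mean square error is the mean
of the squared differences; this convention is an assumption, not something
stated in the text. Under it, the values for the last two estimate columns
agree with the tabulated figures to the printed precision.

```python
def bias_and_mse(estimates, truth):
    """Return (mean error, mean squared error) with error = truth - estimate."""
    errors = [t - e for e, t in zip(estimates, truth)]
    n = len(errors)
    return sum(errors) / n, sum(err * err for err in errors) / n


# Per-segment values transcribed from table 2 (segments 1005 to 1853).
gt = [0.348, 0.231, 0.744, 0.301, 0.524, 0.291, 0.164, 0.306]
est_eq44_46 = [0.3227, 0.2174, 0.6921, 0.2432, 0.4937, 0.2142, 0.1569, 0.3021]
est_eq47_48 = [0.3025, 0.2172, 0.7139, 0.3647, 0.4814, 0.2156, 0.1932, 0.3052]
est_eq9 = [0.2456, 0.1975, 0.6265, 0.2109, 0.2963, 0.1085, 0.1181, 0.3246]

for name, est in [("eqs. (44)-(46)", est_eq44_46),
                  ("eqs. (47)-(48)", est_eq47_48),
                  ("eq. (9)", est_eq9)]:
    bias, mse = bias_and_mse(est, gt)
    print(f"{name}: bias = {bias:.4e}, mse = {mse:.4e}")
```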





8. CONCLUDING REMARKS 


In the classification of imagery data such as in the machine processing of
remotely sensed MSS data, unsupervised classification techniques have been
found to be effective. Clustering of the data partitions the image into its
inherent modes. Labeling these clusters is one of the crucial problems in the
application of clustering techniques for the classification of imagery data.

In the analysis of remotely sensed data, labels for the training patterns are
usually provided by an AI who examines the imagery films and uses ancillary
information such as historic information and crop growth stage models. These
labels are usually imperfect. Most of the imagery data are abundant in
spatial content, and spatial information improves the classification by
machine processing.

In this paper, the problem of obtaining the probabilities of class labels for
the clusters is considered. It is assumed that a set of labeled patterns
$X_i(j)$ with class labels $\omega_i(j) = i$ and their neighbors
$Y_i^{\ell}(j)$ ($\ell = 1,2,\ldots,4$; $j = 1,2,\ldots,N_i$; and
$i = 1,2,\ldots,C$) are given, where C is the number of classes. The
probabilities of imperfections in the labels are assumed to be available. It
is also assumed that the number of inherent modes in the data, mode
conditional densities, and a priori probabilities of the modes are given.
Expressions are developed for obtaining the probabilities of class labels for
the clusters using all the available information.

Experimental results are obtained from the processing of remotely sensed MSS
imagery data. One of the important objectives in the analysis of remotely
sensed data is to estimate the proportion of the crop of interest. In
estimating the proportions through cluster labeling, use of contextual
information resulted in better estimates when the imperfections in the labels
are small. Furthermore, the use of probabilities of label imperfections
resulted in better proportion estimates through cluster labeling.

APPENDIX A

PROBABILITY CLUSTER LABELING WITH FIELD STRUCTURE 


In the practical applications of pattern recognition, such as in the
classification of remotely sensed agricultural imagery data, one of the
difficult problems is obtaining labels for the training patterns. The labels
for the training patterns are usually provided by an analyst-interpreter who
examines imagery films and uses other information such as historic
information and crop calendar models.

It has been observed that the field-like structures that are normally in agri- 
cultural imagery are relatively easy to label in comparison to the pixels. 
Recently, considerable interest has been shown in developing techniques for 
locating fields in the imagery data (ref. 20) and for developing maximum like- 
lihood clustering algorithms (ref. 21) to fit the mixture of Gaussian density 
functions by taking the field structure of the data into account. These 
algorithms typically give the a priori probabilities and Gaussian cluster 
conditional densities for the inherent modes in the data. The situation is 
illustrated in the following figure. 


[Figure A-1: schematic of an image with a legend distinguishing fields,
clusters, and classes.]

Figure A-1.- Illustration of fields, clusters, and classes in an image.

It is the purpose of this appendix to consider the problem of obtaining the
probabilities of class labels for the clusters using information from a given
set of labeled fields. It is assumed that a set of labeled fields from each
class is given. Let $F_j(i)$, $j = 1,2,\ldots,f(i)$, be the labeled fields of
class i, $i = 1,2,\ldots,C$. Let $N_j(i)$ be the number of pixels in the jth
labeled field of class i. Let $X_{jk}(i)$ be the spectral vector of the kth
pixel of the jth labeled field of class i. Let $X_j(i)$ be the concatenated
vector of spectral vectors of pixels in the jth labeled field of class i.
That is,

$$X_j(i) = \begin{bmatrix} X_{j1}(i) \\ X_{j2}(i) \\ \vdots \\ X_{jN_j(i)}(i) \end{bmatrix} \tag{A-1}$$

It is also assumed that the probability density functions and a priori
probabilities of the clusters are given. Let these be
$p(X \mid \Omega = i)$ and $\delta_i$, $i = 1,2,\ldots,m$, respectively, where
m is the number of clusters. Assuming the fields are independent, the
likelihood of occurrence of $X_j(i)$ with their labels $\omega_j(i) = i$, but
normalized, is given by

$$\Lambda = \prod_{i=1}^{C} \prod_{j=1}^{f(i)} \frac{p[X_j(i), \omega_j(i) = i]}{p[X_j(i)]} = \prod_{i=1}^{C} \prod_{j=1}^{f(i)} p[\omega_j(i) = i \mid X_j(i)] \tag{A-2}$$


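If X is a concatenated vector of spectral vectors in a field, similar to
equation (4), we have

$$p(\omega = i \mid X) = \sum_{\ell=1}^{m} \alpha_{\ell i}\, p(\Omega = \ell \mid X) \tag{A-3}$$

Using equation (A-3) in equation (A-2), the log likelihood function can be
written as

$$L = \log(\Lambda) = \sum_{i=1}^{C} \sum_{j=1}^{f(i)} \log\left\{ \sum_{\ell=1}^{m} \alpha_{\ell i}\, p[\Omega = \ell \mid X_j(i)] \right\} \tag{A-4}$$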


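A fixed-point iteration equation for the probabilities of class labels for the
clusters that maximize L of equation (A-4), subject to the constraints of
equation (5), can be written from equations (9) and (10) as

$$\alpha_{\ell i} = \frac{\sum_{j=1}^{f(i)} d_{\ell i j}}{\sum_{r=1}^{C} \sum_{j=1}^{f(r)} d_{\ell r j}} \tag{A-5}$$

where

$$d_{\ell i j} = \frac{\alpha_{\ell i}\, p[\Omega = \ell \mid X_j(i)]}{\sum_{s=1}^{m} \alpha_{s i}\, p[\Omega = s \mid X_j(i)]} \tag{A-6}$$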


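But from the Bayes rule, we have

$$p[\Omega = \ell \mid X_j(i)] = \frac{P(\Omega = \ell)\, p[X_j(i) \mid \Omega = \ell]}{p[X_j(i)]} = \frac{P(\Omega = \ell)\, p[X_j(i) \mid \Omega = \ell]}{\sum_{s=1}^{m} P(\Omega = s)\, p[X_j(i) \mid \Omega = s]} \tag{A-7}$$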


The computation of a posteriori probabilities of the clusters
$p[\Omega = \ell \mid X_j(i)]$ can be considerably simplified by noting that
the sequence $[\bar{X}_j(i), S_j(i);\ j = 1,2,\ldots,f(i);\ i = 1,2,\ldots,C]$
is a sufficient statistic for the criterion, where $\bar{X}_j(i)$ and
$S_j(i)$ are the sample mean and the sample scatter matrix of the jth field of
the ith class, respectively. That is,

$$\bar{X}_j(i) = \frac{1}{N_j(i)} \sum_{k=1}^{N_j(i)} X_{jk}(i)
\quad \text{and} \quad
S_j(i) = \sum_{k=1}^{N_j(i)} [X_{jk}(i) - \bar{X}_j(i)][X_{jk}(i) - \bar{X}_j(i)]^T \tag{A-8}$$

The sufficiency of the sequence $[\bar{X}_j(i), S_j(i)]$ implies that

$$\frac{P(\Omega = \ell)\, p[X_j(i) \mid \Omega = \ell]}{p[X_j(i)]} = \frac{\delta_\ell\, q_\ell[\bar{X}_j(i), S_j(i)]}{q[\bar{X}_j(i), S_j(i)]} \tag{A-9}$$

where $q_\ell[\bar{X}_j(i), S_j(i)]$ is the joint density of $\bar{X}_j(i)$ and
$S_j(i)$, given that the cluster $\ell$ contains the field $F_j(i)$, and

$$q[\bar{X}_j(i), S_j(i)] = \sum_{\ell=1}^{m} \delta_\ell\, q_\ell[\bar{X}_j(i), S_j(i)] \tag{A-10}$$


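If $p(X \mid \Omega = \ell) \sim N(\mu_\ell, \Sigma_\ell)$, the joint density
$q_\ell[\bar{X}_j(i), S_j(i)]$ can be expressed as

$$q_\ell[\bar{X}_j(i), S_j(i)] = N_d\left[\bar{X}_j(i);\ \mu_\ell,\ \frac{1}{N_j(i)} \Sigma_\ell\right] W_d\left[S_j(i);\ N_j(i) - 1,\ \Sigma_\ell\right] \tag{A-11}$$

where $N_d[\bar{X}_j(i);\ \mu_\ell,\ \Sigma_\ell / N_j(i)]$ is the d-variate
normal density of $\bar{X}_j(i)$ and
$W_d[S_j(i);\ N_j(i) - 1,\ \Sigma_\ell]$ is the Wishart density of $S_j(i)$
with $N_j(i) - 1$ degrees of freedom. It can easily be shown that the density
of sample mean $\bar{X}_j(i)$ is given by

$$p[\bar{X}_j(i) \mid \Omega = \ell] \sim N\left(\mu_\ell,\ \frac{1}{N_j(i)} \Sigma_\ell\right) \tag{A-12}$$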


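The Wishart density of $S_j(i)$ with $N_j(i) - 1$ degrees of freedom can be written
as

$$W_d[S_j(i);\ N_j(i) - 1,\ \Sigma_\ell] = \frac{|S_j(i)|^{\frac{1}{2}\{[N_j(i) - 1] - d - 1\}} \exp\left\{-\frac{1}{2}\operatorname{tr}[S_j(i)\,\Sigma_\ell^{-1}]\right\}}{2^{\frac{d}{2}[N_j(i) - 1]}\ \pi^{\frac{d(d-1)}{4}}\ |\Sigma_\ell|^{\frac{1}{2}[N_j(i) - 1]}\ \prod_{u=1}^{d} \Gamma\left\{\frac{1}{2}[N_j(i) - u]\right\}} \qquad \text{(A-13)}$$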




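Using equations (A-12) and (A-13) in equation (A-9) yields

$$\frac{P(\Omega = \ell)\, q_\ell[\bar{X}_j(i), S_j(i)]}{q[\bar{X}_j(i), S_j(i)]} = \frac{\delta_\ell\, |\Sigma_\ell|^{-\frac{1}{2}N_j(i)} \exp\left\{-\frac{1}{2}\operatorname{tr}\left(\Sigma_\ell^{-1}\left\{S_j(i) + N_j(i)[\bar{X}_j(i) - \mu_\ell][\bar{X}_j(i) - \mu_\ell]^{T}\right\}\right)\right\}}{\sum_{r=1}^{m} \delta_r\, |\Sigma_r|^{-\frac{1}{2}N_j(i)} \exp\left\{-\frac{1}{2}\operatorname{tr}\left(\Sigma_r^{-1}\left\{S_j(i) + N_j(i)[\bar{X}_j(i) - \mu_r][\bar{X}_j(i) - \mu_r]^{T}\right\}\right)\right\}} \qquad \text{(A-14)}$$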


Equation (A-14) can be used in equations (A-5) and (A-6) to obtain optimal
probabilities of class labels for the clusters using information from a given
set of labeled fields.
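A possible numerical form of equation (A-14) is sketched below. It evaluates the cluster posteriors of one field from its sufficient statistics in the log domain, which avoids underflow when a field contains many pixels. The function name, argument layout, and the log-domain normalization are choices made for this sketch and are not taken from the report.

```python
import numpy as np


def field_posteriors(xbar, scatter, n, priors, means, covs):
    """Posterior probability of each cluster for one field, following (A-14).

    xbar    : (d,) sample mean of the field
    scatter : (d, d) sample scatter matrix of the field
    n       : number of pixels in the field
    priors  : (m,) a priori probabilities of the clusters
    means   : sequence of m mean vectors, each of shape (d,)
    covs    : sequence of m covariance matrices, each of shape (d, d)
    """
    log_w = np.empty(len(priors))
    for l, (delta, mu, cov) in enumerate(zip(priors, means, covs)):
        diff = (xbar - mu)[:, None]
        m_mat = scatter + n * (diff @ diff.T)
        _, logdet = np.linalg.slogdet(cov)
        # log of: delta * |cov|^(-n/2) * exp(-0.5 * tr(cov^-1 m_mat))
        log_w[l] = np.log(delta) - 0.5 * n * logdet - 0.5 * np.trace(np.linalg.solve(cov, m_mat))
    log_w -= log_w.max()          # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()
```

Terms that do not depend on the cluster cancel in the ratio, so only the prior, the determinant, and the trace term are evaluated.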
APPENDIX B

CONTEXTUAL CLUSTER LABELING WITH THE CRITERION OF PROBABILITY OF CORRECT LABELING

The problem of obtaining the optimal probabilities of class labels for the
clusters using the criterion of probability of correct labeling is formulated
in this appendix. It is assumed that a set of patterns $X_i(j)$ with imperfect
labels $\omega'_i(j) = i$ and with the neighbors as shown in figure 1,
for $j = 1, 2, \ldots, N_i$ and $i = 1, 2, \ldots, C$, are given. The probabilities of label
imperfections are assumed to be available. It is also assumed that the
probability density functions and the a priori probabilities of the clusters
are given. If a pattern X with the neighbors $Y^1, \ldots, Y^4$ comes from class i,
then for particular a priori probabilities and probability densities of the
classes the probability with which it is correctly classified into class i is
$p(\omega = i \mid X, Y^1, \ldots, Y^4)$. Since the logarithm is a monotonic function of its
argument, the criterion of probability of correct labeling (PCL) may be defined as

$$P_{CL} = \sum_{i=1}^{C} P(\omega = i) \int \log[p(\omega = i \mid X, Y^1, \ldots, Y^4)]\, p(X \mid \omega = i)\, dX \qquad \text{(B-1)}$$


Let $\beta$ be the matrix of probabilities of label imperfections, where

$$\beta = [\beta_{ij}] \qquad \text{(B-2)}$$


Let

$$\nu = (\beta^{T})^{-1} \qquad \text{(B-3)}$$


Using equations (B-2) and (B-3) and inverting equation (32) results in

$$P(\omega = i)\, p(X \mid \omega = i) = \sum_{j=1}^{C} \nu_{ij}\, P(\omega' = j)\, p(X \mid \omega' = j) \qquad \text{(B-4)}$$





From equations (B-1) and (B-4) we get

$$P_{CL} = \sum_{i=1}^{C} \sum_{j=1}^{C} \nu_{ij}\, P(\omega' = j) \int \log[p(\omega = i \mid X, Y^1, \ldots, Y^4)]\, p(X \mid \omega' = j)\, dX \qquad \text{(B-5)}$$


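Using the given imperfectly labeled patterns and their neighbors, an estimate
for $P_{CL}$ of equation (B-5) can be written as

$$\hat{P}_{CL} = \sum_{i=1}^{C} \sum_{j=1}^{C} \nu_{ij}\, P(\omega' = j)\, \frac{1}{N_j} \sum_{k=1}^{N_j} \log\{p[\omega = i \mid X_j(k), Y_j^1(k), \ldots, Y_j^4(k)]\} \qquad \text{(B-6)}$$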


Substituting sample estimates for $P(\omega' = j)$ and using equations (24) and (4)
in equation (B-6), the criterion can be written as



C C 

+ E E 

U=1 J=1 


C 4 ( C m 0 

uj E E log) E E a r i< h u PC n = r l Y i(k)] 

UJ k=l a= 1 )k 8 =l r^l V 3 


where 


■k t u 


i M i (k) : M M i ( k > * u 


L “j( k ) ■ k . 


(B-7) 


( 8 - 8 ) 


The probabilities that maximize Cr of equation (B-7) and that are subject
to the constraints of equation (5) can be obtained using optimization
techniques such as Davidon-Fletcher-Powell (refs. 16-18). However, fixed-point
iteration equations are developed in the following.


B.1 FIXED-POINT ITERATION SCHEME FOR OPTIMAL $\alpha_{ri}$

It is noted that in equation (B-7), $\nu_{ij}$ might be negative. In the following,
fixed-point iteration equations for obtaining optimal $\alpha_{ri}$, the probabilities
of class labels for the clusters, are developed. Consider


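$$p(X \mid \omega = i) = \sum_{j=1}^{C} p(X, \omega' = j \mid \omega = i) = \sum_{j=1}^{C} \beta_{ij}\, p(X \mid \omega' = j) \qquad \text{(B-9)}$$

where it is assumed that

$$p(X \mid \omega = i, \omega' = j) = p(X \mid \omega' = j) \qquad \text{(B-10)}$$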


In terms of probabilities of label imperfections, the a priori probabilities 
of perfect and imperfectly labeled classes are related as 


$$P(\omega' = i) = \sum_{j=1}^{C} \beta_{ji}\, P(\omega = j) \qquad \text{(B-11)}$$


Inverting equation (B-11), we get

$$P(\omega = i) = \sum_{j=1}^{C} \nu_{ij}\, P(\omega' = j) \qquad \text{(B-12)}$$

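Using equation (B-9) in equation (B-1), an estimate for $P_{CL}$ can be written as

$$\hat{P}_{CL} = \sum_{i=1}^{C} \sum_{j=1}^{C} \eta_{ij}\, \frac{1}{N_j} \sum_{k=1}^{N_j} \log\{p[\omega = i \mid X_j(k), Y_j^1(k), \ldots, Y_j^4(k)]\} \qquad \text{(B-13)}$$

where

$$\eta_{ij} = P(\omega = i)\, \beta_{ij} \qquad \text{(B-14)}$$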



From equations (4), (24), and (B-13), the criterion becomes 


C C JJj ( m 

* E E n ii L log £ a r iP^ s 

i=l j=l k=l ( r=l ri 


Cr 


r|Xj (k)D' 


m 


C C 4 

E E n,H t E i°g{X. E 


imJ i—J 111 * J LmJ 

u=i j=i UJ k=i i=i 


V 1 r=1 




S2 = 


r|Yj(k)J 


{ B-15) 


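The following fixed-point iteration equations for obtaining optimal $\alpha_{ri}$ that
maximize Cr of equation (B-15), subject to the constraints of equation (5),
can easily be obtained by introducing Lagrangian multipliers. That is

$$\alpha_{ri} = \frac{\alpha_{ri}\,(\xi^{x}_{ri} + \xi^{y}_{ri})}{\sum_{t=1}^{C} \alpha_{rt}\,(\xi^{x}_{rt} + \xi^{y}_{rt})} \qquad \text{(B-16)}$$

where

$$\xi^{x}_{ri} = \sum_{j=1}^{C} \eta_{ij}\, \frac{1}{N_j} \sum_{k=1}^{N_j} \frac{p[\Omega = r \mid X_j(k)]}{\sum_{s=1}^{m} \alpha_{si}\, p[\Omega = s \mid X_j(k)]} \qquad \text{(B-17)}$$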


and 6 


C C "i 
i - £ L n„, ± 


Nj 


ri 


) ■ r|Y*Ml 


«,v(- =!->,] 


( B- 18) 


If the spatial information is not used, the fixed-point iteration equations
for obtaining the optimal probabilities $\alpha_{ri}$ become the following.

$$\alpha_{ri} = \frac{\alpha_{ri}\, \xi^{x}_{ri}}{\sum_{t=1}^{C} \alpha_{rt}\, \xi^{x}_{rt}} \qquad \text{(B-19)}$$

where $\xi^{x}_{ri}$ is given by equation (B-17). It is noted that when there are no
imperfections in the labels, equation (B-19) is identical to equation (9).

B.2 EXPERIMENTAL RESULTS 

This section presents some results from the processing of remotely sensed
multispectral scanner imagery data. The objective of the processing is to
estimate the proportion of the class of interest through probabilistic cluster
labeling. The class of interest is wheat, and its proportion is estimated
using equation (17). The same labeled patterns and the cluster statistics of
section 7 are used. The a priori probabilities of imperfectly labeled classes
for use in equation (B-12) are estimated as sample estimates. The a priori
and the transition probabilities used in the local neighborhood of the given
labeled patterns are given in equation (49). The results are listed in
table B-1. From table B-1, it is seen that better proportion estimates are
obtained by taking the imperfections in the labels into account.
TABLE B-1.- ESTIMATED PROPORTION OF CLASS 1 WITH IMPERFECT LABELS AND β-MATRIX

| Segment | Location (county, state) | Computed β-matrix comparing A.I. and G.T. labels | Proportion estimate using eqs. (B-16), (B-17), and (B-18) | Proportion estimate using eqs. (B-19) and (B-17) | Proportion estimate directly with A.I. labels using eq. (9) | G.T. proportion |
|---|---|---|---|---|---|---|
| 1005 | Sherman, Texas | [0.5455 0.4545; 0.0308 0.9692] | 0.3194 | 0.3641 | 0.2456 | 0.348 |
| 1060 | Cheyenne, Colorado | [0.5667 0.4333; 0.0263 0.9737] | 0.2297 | 0.2787 | 0.1975 | 0.231 |
| 1231 | Jackson, Oklahoma | [0.9315 0.0685; 0.1304 0.8696] | 0.7640 | 0.7546 | 0.6265 | 0.744 |
| 1520 | Big Stone, Montana | [0.7917 0.2083; 0.0149 0.9851] | 0.2398 | 0.2661 | 0.2109 | 0.301 |
| 1604 | Renville, North Dakota | [0.4600 0.5400; 0.1569 0.8431] | 0.4981 | 0.5035 | 0.2963 | 0.524 |
| 1675 | McPherson, South Dakota | [0.2667 0.7333; 0.0390 0.9610] | 0.2681 | 0.2448 | 0.1085 | 0.291 |
| 1805 | Gregory, South Dakota | [0.4211 0.5789; 0.0640 0.9360] | 0.1502 | 0.1385 | 0.1181 | 0.164 |
| 1853 | Ness, Kansas | [0.8077 0.1923; 0.0615 0.9385] | 0.2769 | 0.3164 | 0.3246 | 0.306 |
| Bias | | | 0.20350E-01 | 0.52875E-02 | 0.9763E-01 | |
| Mean square error | | | 0.89969E-03 | 0.89725E-03 | 0.1514E-01 | |
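The summary rows of table B-1 can be checked with a short calculation. The sketch below assumes that the bias is the mean of (G.T. proportion minus estimate) over the eight segments and that the mean square error is the mean of the squared differences; the source does not state these definitions, and the dictionary keys are labels invented for this example.

```python
import numpy as np

# Ground-truth wheat proportions for segments 1005 ... 1853 (table B-1).
ground_truth = np.array([0.348, 0.231, 0.744, 0.301, 0.524, 0.291, 0.164, 0.306])

estimates = {
    # eqs. (B-16), (B-17), (B-18). The value 0.4981 for segment 1604 is the one
    # consistent with the reported bias and mean square error of this column.
    "contextual": np.array([0.3194, 0.2297, 0.7640, 0.2398, 0.4981, 0.2681, 0.1502, 0.2769]),
    # eqs. (B-19) and (B-17)
    "no_spatial": np.array([0.3641, 0.2787, 0.7546, 0.2661, 0.5035, 0.2448, 0.1385, 0.3164]),
    # eq. (9), using the analyst labels directly
    "direct": np.array([0.2456, 0.1975, 0.6265, 0.2109, 0.2963, 0.1085, 0.1181, 0.3246]),
}


def bias_and_mse(estimate, truth):
    """Mean error and mean squared error of a proportion estimate (assumed definitions)."""
    err = truth - estimate
    return err.mean(), (err ** 2).mean()


summary = {name: bias_and_mse(est, ground_truth) for name, est in estimates.items()}
for name, (bias, mse) in summary.items():
    print(f"{name:12s} bias = {bias:.5e}  mse = {mse:.5e}")
```

Under these assumed definitions the computed values agree with the tabulated bias and mean square error to the precision printed in the table.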